Most text algorithms build data structures on words, mainly trees, such as digital trees (tries) or
binary search trees (Bst). The mechanism which produces the symbols of the words (one symbol at each
unit of time) is called a source in information-theoretic contexts. The probabilistic behaviour of the trees
built on words emitted by the same source depends on two factors: the algorithmic properties of the
tree, together with the information-theoretic properties of the source. Very often, these two factors
are considered in an oversimplified way: from the algorithmic point of view, the cost of the Bst is only
measured in terms of the number of comparisons between words; from the information-theoretic
point of view, only simple sources (memoryless sources or Markov chains) are studied.
We wish to perform here a realistic analysis, and we choose to deal together with a general source 
and a realistic cost for data structures: we take into account comparisons between symbols, and 
we consider a general model of source, related to a dynamical system, which is called a dynamical 
source. Our methods are close to analytic combinatorics, and our main object of interest is the 
generating function of the source $\Lambda(s)$, which is here of Dirichlet type. Such an object transforms
probabilistic properties of the source into analytic properties. The tameness of the source, which is
defined through analytic properties of $\Lambda(s)$, appears to be central in the analysis, and is precisely
studied for the class of dynamical sources. We focus here on arithmetical conditions, of diophantine 
type, which are sufficient to imply tameness on a domain with hyperbolic shape. 

Plan of the paper. We first recall in Section 1 general facts on sources and trees, and define the
probabilistic model chosen for the analysis. Then, we provide the statements of the two main theorems
(Theorems 1 and 2) which establish the possible probabilistic behaviour of trees, provided that the source
is tame. The tameness notions are defined in a general framework and then studied in the case of simple
sources (memoryless sources and Markov chains). In Section 2, we focus on a general model of sources, 
the dynamical sources, that contains as a subclass the simple sources. We present sufficient conditions 
on these sources under which it is possible to prove tameness. We compare these tameness properties to 
those of simple sources, and exhibit both resemblances and differences between the two classes. 

1 Probabilistic behaviour of trees built on general sources. 

1.1. General sources. Throughout this paper, an ordered (possibly infinite denumerable) alphabet
$\Sigma := \{a_1, a_2, \ldots, a_r\}$ is fixed.

A probabilistic source, which produces infinite words of $\Sigma^{\mathbb{N}}$, is specified by the set
$\{p_w, w \in \Sigma^\star\}$ of fundamental probabilities $p_w$, where $p_w$ is the probability that an infinite
word begins with the finite prefix $w$. It is furthermore assumed that $\pi_k := \sup\{p_w : w \in \Sigma^k\}$
tends to 0, as $k \to \infty$.
As is usual in the domain of analytic combinatorics, well described in [11], our analyses involve the
generating function of the source, here of Dirichlet type, first introduced in [25] and defined as

$$\Lambda(s) := \sum_{w \in \Sigma^\star} p_w^s, \qquad \Lambda_{(k)}(s) := \sum_{w \in \Sigma^k} p_w^s. \qquad (1)$$

Since all the equalities $\Lambda_{(k)}(1) = 1$ hold, the series $\Lambda(s)$ is divergent at $s = 1$, and the probabilistic
properties of the source can be expressed in terms of the regularity of $\Lambda$ near $s = 1$, as is known from
previous works [25] and will be recalled later. For instance, the entropy $h(\mathcal{S})$ relative to a probabilistic
source $\mathcal{S}$ is defined as the limit (if it exists) that involves the previous Dirichlet series

$$h(\mathcal{S}) := \lim_{k \to \infty} \frac{-1}{k} \sum_{w \in \Sigma^k} p_w \log p_w = \lim_{k \to \infty} \frac{-1}{k} \frac{d}{ds} \Lambda_{(k)}(s) \Big|_{s=1}. \qquad (2)$$

1.2. Simple sources: memoryless sources and Markov chains. A memoryless source, associated to
the (possibly infinite) alphabet $\Sigma$, is defined by the set $(p_j)_{j \in \Sigma}$ of probabilities, and the Dirichlet series
$\Lambda, \Lambda_{(k)}$ are expressed with

$$\lambda(s) = \sum_{i \in \Sigma} p_i^s, \quad \text{under the form} \quad \Lambda_{(k)}(s) = \lambda(s)^k, \qquad \Lambda(s) = \frac{1}{1 - \lambda(s)}. \qquad (3)$$

A Markov chain associated to the finite alphabet $\Sigma$ is defined by the vector $R$ of initial probabilities
$(r_i)_{i \in \Sigma}$ together with the transition matrix $P := [(p_{i|j})_{(i,j) \in \Sigma \times \Sigma}]$. We denote by $P(s)$ the matrix with
general coefficient $p_{i|j}^s$, and by $R(s)$ the vector of components $r_i^s$. Then

$$\Lambda(s) = 1 + {}^t\mathbf{1} \cdot (I - P(s))^{-1} \cdot R(s). \qquad (4)$$

If, moreover, the matrix $P$ is irreducible and aperiodic, then, for any real $s$, the matrix $P(s)$ has a unique
dominant eigenvalue $\lambda(s)$.

In both cases, the entropy satisfies $h(\mathcal{S}) = -\lambda'(1)$.
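
As a quick numerical illustration of the memoryless formulas in (3) and of the identity $h(\mathcal{S}) = -\lambda'(1)$, the following Python sketch may help. It is only a sanity check under stated assumptions: the three probabilities are a hypothetical example, and all function names are ours.

```python
from itertools import product
from math import log, prod

def lam(probs, s):
    """lambda(s) = sum_i p_i^s for a memoryless source."""
    return sum(p ** s for p in probs)

def Lambda_k_bruteforce(probs, k, s):
    """Lambda_(k)(s): sum of p_w^s over all words w of length k (p_w is a product)."""
    return sum(prod(word) ** s for word in product(probs, repeat=k))

def Lambda(probs, s):
    """Lambda(s) = 1 / (1 - lambda(s)), used here for real s > 1 only."""
    return 1.0 / (1.0 - lam(probs, s))

def entropy(probs):
    """Shannon entropy -sum_i p_i log p_i, which is -lambda'(1)."""
    return -sum(p * log(p) for p in probs)

probs = (0.5, 0.3, 0.2)   # hypothetical memoryless source on three symbols
eps = 1e-6
# Central finite difference approximating -lambda'(1).
minus_lambda_prime = -(lam(probs, 1 + eps) - lam(probs, 1 - eps)) / (2 * eps)
```

Brute-force enumeration reproduces $\Lambda_{(k)}(s) = \lambda(s)^k$, and $(s-1)\Lambda(s)$ evaluated slightly to the right of $s = 1$ is close to $1/h(\mathcal{S})$, since $1 - \lambda(s) \approx h(\mathcal{S})(s-1)$ there.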

1.3. The first main data structure: the trie. A trie is a tree structure which is used as a dictionary
in various applications, such as partial match queries, text processing tasks or compression. This justifies
considering the trie structure as one of the central general-purpose data structures of Computer Science.
See [13] or [23] for an algorithmic study of this structure.

The trie structure compares words via their prefixes: it is based on a splitting according to symbols
encountered. If $\mathcal{X}$ is a set of (infinite) words over $\Sigma$, then the trie associated to $\mathcal{X}$ is defined
recursively by the rule: $\mathrm{Trie}(\mathcal{X})$ is an internal node to which are attached the tries
$\mathrm{Trie}(\mathcal{X} \setminus a_1), \mathrm{Trie}(\mathcal{X} \setminus a_2), \ldots, \mathrm{Trie}(\mathcal{X} \setminus a_r)$. Here, the set $\mathcal{X} \setminus a$ denotes the subset of
$\mathcal{X}$ consisting of strings that start with the symbol $a$, stripped of their initial symbol $a$; recursion
is halted as soon as $\mathcal{X}$ contains fewer than two elements: if $\mathcal{X}$ is empty, then $\mathrm{Trie}(\mathcal{X})$ is empty;
if $\mathcal{X}$ has only one element $X$, then $\mathrm{Trie}(\mathcal{X})$ is a leaf labelled with $X$.

For $|\mathcal{X}| = n$, the trie $\mathrm{Trie}(\mathcal{X})$ has exactly $n$ branches, and the depth of a branch is the number of
(internal) nodes that it contains. The path-length equals the sum of the depths of all branches: this is the
total number of symbols that need to be examined in order to distinguish all elements of $\mathcal{X}$. Divided by
the number of elements, it is also by definition the cost of a positive search (i.e. searching for a word
that is present in the trie). The size of the tree is the number of its internal nodes. Adding the
cardinality of $\mathcal{X}$ to the size gives the number of prefixes necessary to isolate all elements of $\mathcal{X}$. It also
gives a precise estimate of the space needed in memory to store the trie in a real-life implementation. In this
paper, we focus on two trie parameters: the size and the path-length.
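
The recursive definition and the two parameters can be sketched in a few lines of Python. This is a minimal illustration, not an implementation from the paper: infinite words are replaced by finite strings that are assumed distinct and long enough to be separated, and the function name is ours.

```python
def trie_params(words):
    """Return (size, path_length) of Trie(words).

    size        = number of internal nodes;
    path_length = sum over branches of the number of internal nodes they contain,
                  i.e. the sum over internal nodes of the number of words below them.
    Words are finite strings assumed distinct, none being a prefix of another.
    """
    if len(words) < 2:          # empty trie or leaf: recursion is halted
        return 0, 0
    groups = {}
    for w in words:
        if not w:
            raise ValueError("words are too short to be distinguished")
        # X \ a: words starting with a, stripped of their initial symbol
        groups.setdefault(w[0], []).append(w[1:])
    size, path_length = 1, len(words)   # this internal node lies on every branch below it
    for subset in groups.values():
        sub_size, sub_path = trie_params(subset)
        size += sub_size
        path_length += sub_path
    return size, path_length
```

For the four words `aab`, `aba`, `abb`, `baa`, the internal nodes are the root and the nodes reached by the prefixes `a` and `ab`, so the size is 3, and the branch depths 2, 3, 3, 1 add up to a path-length of 9.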

1.4. The second main data structure: the binary search tree (Bst). We revisit here this
well-known structure. Usually, this kind of tree contains keys and the path length of this tree measures the
number of key comparisons that are needed to build the tree, and sort the keys by a method closely 
related to Quicksort. This usual cost -the number of key comparisons- is not realistic when the keys 
have a complex structure, in the context of data bases or natural languages, for instance. In this case, 
it is more convenient to view a key as a word, and now, the cost for comparing two words (in the
lexicographic order) is closely related to the length of their longest common prefix, called the coincidence.
The convenient cost of the Bst is then the total number of symbol comparisons between words that are
needed to build it; this is a kind of weighted path length, called the symbol path-length of the Bst, also
equal to the total symbol cost of Quicksort. For instance, for inserting the key F in the Bst of Figure
1, the number of key comparisons equals 3, whereas the number of symbol comparisons equals 18 (7 for
comparing F to A, 8 for comparing F to B and 1 for comparing F to C). It is this symbol path length
that is studied in the following.
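
The difference between the two costs can be made concrete with a small Python sketch. It is illustrative only and rests on assumptions of ours: words are finite strings that differ before either one ends, and one word comparison is charged its coincidence plus one symbol comparisons (the symbol on which the two words differ is also read).

```python
def compare(u, v):
    """Lexicographic comparison of two words that differ before either one ends.

    Returns (sign, cost) where cost = coincidence + 1 symbol comparisons
    (assumed convention).
    """
    c = 0
    while c < len(u) and c < len(v) and u[c] == v[c]:
        c += 1
    if c == len(u) or c == len(v):
        raise ValueError("words must differ before either one ends")
    return (-1 if u[c] < v[c] else 1), c + 1

def bst_costs(words):
    """Insert words one by one in a Bst; return (key comparisons, symbol comparisons)."""
    root = None                      # a node is a list [key, left, right]
    key_cmp = sym_cmp = 0
    for w in words:
        if root is None:
            root = [w, None, None]
            continue
        node = root
        while True:
            sign, cost = compare(w, node[0])
            key_cmp += 1
            sym_cmp += cost
            child = 1 if sign < 0 else 2
            if node[child] is None:
                node[child] = [w, None, None]
                break
            node = node[child]
    return key_cmp, sym_cmp
```

Inserting `ba`, `ab`, `aa`, `bb` in this order uses 4 key comparisons but 6 symbol comparisons; the second count is the symbol path-length of the resulting tree under the assumed convention.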




Figure 1: On the left, a trie built on sixteen words of $\{a,b\}^\star$. On the right, a binary search tree built on seven
words of $\{a,b\}^\star$.

1.5. Average-case analysis: exact expressions of the three mean costs. The average-case analysis of
structures (or algorithms) aims at characterizing the mean value of their parameters under a well-defined
probabilistic model that describes the initial distribution of their inputs. Here, we adopt the following quite
general model: we work with a finite sequence $\mathcal{X}$ of infinite words independently produced by the same
general source $\mathcal{S}$, and we wish to estimate the mean value of the parameters when the cardinality $n$ of
$\mathcal{X}$ becomes large. In this paper, we focus on three main parameters, two for $\mathrm{Trie}(\mathcal{X})$ and one for
$\mathrm{Bst}(\mathcal{X})$. When restricted to simple sources, there exist many works that study the trie parameters (see
S [Ml \T5l |24l) or the symbol path length for Bst (see HI). The same studies, in the case of a general 
source, are done in llH for the Trie and in \\2S\ for the Bst, and are summarized as follows: 
Theorem 1 [Clément, Fill, Flajolet, Vallée]. Let $\mathcal{S}$ be a general source. Consider a finite sequence $\mathcal{X}$
of $n$ infinite words independently produced by $\mathcal{S}$. Then the expectations of the size $R$ of $\mathrm{Trie}(\mathcal{X})$, the
path length $C$ of $\mathrm{Trie}(\mathcal{X})$, and the symbol path-length $B$ of the binary search tree $\mathrm{Bst}(\mathcal{X})$ are all expressed
under the form

$$T(n) = \sum_{k=2}^{n} (-1)^k \binom{n}{k} \varpi_T(k), \qquad (5)$$

where the function $\varpi_T(s)$ is a Dirichlet series which depends on the parameter $T$ and is closely related
to the Dirichlet series $\Lambda(s)$ of the source $\mathcal{S}$, defined in (1):

$$\varpi_R(s) = (s-1)\Lambda(s), \qquad \varpi_C(s) = s\Lambda(s), \qquad \varpi_B(s) = \frac{2}{s(s-1)}\Lambda(s). \qquad (6)$$
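
For the two trie parameters, the alternating sum (5) can be checked against a direct computation. The sketch below is a sanity check, not part of the paper; it assumes a hypothetical binary memoryless source with probabilities 1/3 and 2/3, uses the standard facts that a prefix $w$ is an internal node exactly when at least two of the $n$ words begin with $w$, and that the path-length counts, for each internal node, the words that pass through it. Exact rational arithmetic avoids the cancellations of the alternating sum.

```python
from fractions import Fraction
from math import comb

P, Q, N = Fraction(1, 3), Fraction(2, 3), 10   # hypothetical binary source, n = 10 words

def Lambda(k):
    """Lambda(k) = 1 / (1 - lambda(k)) at an integer k >= 2, by formula (3)."""
    return 1 / (1 - (P ** k + Q ** k))

def alternating_sum(varpi, n=N):
    """Right-hand side of (5)."""
    return sum((-1) ** k * comb(n, k) * varpi(k) for k in range(2, n + 1))

def direct_sum(node_term, max_depth=60):
    """Sum of node_term(p_w) over all binary words w of length <= max_depth."""
    total = Fraction(0)
    for d in range(max_depth + 1):
        for j in range(d + 1):     # comb(d, j) words have j symbols of probability P
            total += comb(d, j) * node_term(P ** j * Q ** (d - j))
    return total

# Probability that at least two of the N words begin with w (w is an internal node).
size_term = lambda p: 1 - (1 - p) ** N - N * p * (1 - p) ** (N - 1)
# Expected number of words beginning with w, counted only when there are at least two.
path_term = lambda p: N * p * (1 - (1 - p) ** (N - 1))

size_exact = alternating_sum(lambda k: (k - 1) * Lambda(k))   # varpi_R of (6)
path_exact = alternating_sum(lambda k: k * Lambda(k))         # varpi_C of (6)
size_direct = direct_sum(size_term)                            # truncated at depth 60
path_direct = direct_sum(path_term)
```

The direct sums are truncated at depth 60, so they sit slightly below the exact values; the neglected tail is bounded by a multiple of $\lambda(2)^{61}$ and is far below $10^{-9}$ here.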

This result provides exact expressions for the mean values of parameters of interest, that are totally
explicit for simple sources, due to formulae given in (3) or in (4). As we now wish to obtain an asymptotic
form for these mean values, these nice exact expressions are not easy to deal with, due to the presence
of the alternating sum. The Rice formula, described in [20, 21] and introduced by Flajolet and Sedgewick
[10] into the analytic combinatorics domain, transforms an alternating sum into an integral in the complex
plane, provided that the sequence of numerical values $\varpi(k)$ lifts into an analytic function $\varpi(s)$.

Let $T(n)$ be a numerical sequence which can be written as in (5), where the function $\varpi_T(s)$ is analytic
in $\Re(s) > C$, with $1 < C < 2$, and is there of polynomial growth with order at most $r$. Then the sequence
$T(n)$ admits a Nörlund-Rice representation, for $n > r+1$ and any $C < d < 2$:

$$T(n) = \frac{1}{2i\pi} \int_{-d-i\infty}^{-d+i\infty} \varpi_T(-s)\, \frac{n!}{s(s+1)\cdots(s+n)}\, ds. \qquad (7)$$
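
The link between the kernel of (7) and the binomial coefficients of (5) is a residue computation: the kernel $n!/(s(s+1)\cdots(s+n))$ has a simple pole at each $s = -k$, $0 \le k \le n$, with residue $(-1)^k \binom{n}{k}$, and $\varpi_T(-s)$ takes the value $\varpi_T(k)$ there. The following Python sketch (our own check, with a function name of our choosing) verifies this residue formula in exact arithmetic.

```python
from fractions import Fraction
from math import comb, factorial

def kernel_residue(n, k):
    """Residue at s = -k of n! / (s (s+1) ... (s+n)), for 0 <= k <= n.

    The pole is simple, so the residue is n! divided by the product of the
    other factors (s + j), j != k, evaluated at s = -k.
    """
    denom = Fraction(1)
    for j in range(n + 1):
        if j != k:
            denom *= (j - k)        # value of (s + j) at s = -k
    return Fraction(factorial(n)) / denom
```

The poles at $s = 0$ and $s = -1$ lie to the right of the line $\Re s = -d$, while those at $s = -2, \ldots, -n$ lie to its left, which matches the range $2 \le k \le n$ of the sum in (5).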

1.6. Importance of tameness of sources. The idea is now to push the contour of integration in (7) to
the right, past $-1$. This is why we consider the possible behaviours for the function $\varpi_T(s)$ near $\Re s = 1$,
more precisely on the left of the line $\Re s = 1$. Due to the close relations between the functions $\varpi_T(s)$
and the Dirichlet series $\Lambda(s)$ of the source given in (6), it is sufficient to consider possible behaviours for
$\Lambda(s)$ itself. We will later show why the behaviours that are described in the following definition, already
given in [26]¹ and shown in Figure 2, arise in a natural way for a large class of sources.



Definition 1 Let $\mathcal{R}$ be a region that contains the half-plane $\Re s > 1$.

A source $\mathcal{S}$ is $\mathcal{R}$-entropic if $\Lambda(s)$ is meromorphic on $\mathcal{R}$ with a simple pole at $s = 1$, whose
residue involves the entropy $h(\mathcal{S})$ under the form $1/h(\mathcal{S})$.

A source is $\mathcal{R}$-tame if (i) it is $\mathcal{R}$-entropic, (ii) $\Lambda(s)$ has no other pole than $s = 1$ in $\mathcal{R}$, (iii) $\Lambda(s)$ is
of polynomial growth in $\mathcal{R}$ as $|s| \to +\infty$.

A source is

(a) strongly tame (S-tame in shorthand) of abscissa $\delta$ if there exists a vertical strip $\mathcal{R}$ of the form
$\Re(s) > 1 - \delta$, with $\delta > 0$, where $\Lambda(s)$ is $\mathcal{R}$-tame.

(b) hyperbolically tame (H-tame in shorthand) of exponent $\alpha$ if there exists a hyperbolic region $\mathcal{R}$,
defined with $A, B, \alpha > 0$ as

$$\mathcal{R} := \Big\{ s = \sigma + it;\ |t| \ge B,\ \sigma > 1 - \frac{A}{|t|^{\alpha}} \Big\} \cup \Big\{ s = \sigma + it;\ \sigma > 1 - \frac{A}{B^{\alpha}},\ |t| \le B \Big\},$$

where $\Lambda(s)$ is $\mathcal{R}$-tame.

A source $\mathcal{S}$ is periodic of abscissa $\delta$ if there exists a vertical strip $\mathcal{R}$ of the form $\Re(s) > 1 - \delta$, with
$\delta > 0$, where $\Lambda(s)$ is $\mathcal{R}$-entropic and admits a singularity at a point $1 + it_0$, for some real $t_0 > 0$.²

¹ There are slight differences between the two definitions but the "spirit" is the same.

For an entropic source, the Dirichlet series $\varpi_T(s)$ has at $s = 1$ a pole of order 0 (for the Trie size, cost R), a pole
of order 1 (for the Trie path length, cost C), a pole of order 2 (for the Bst symbol path length, cost B).




Figure 2: Three possible domains where the function $\varpi(s)$ is analytic and of polynomial growth.



1.7. Average-case analysis: asymptotic expressions of the three mean costs. Now, the following
result shows that the shape of the tameness region (described by the order, the abscissa, the exponent)
essentially determines the behaviour of the Rice integral in (7), and thus the asymptotic behaviour of our
main parameters of interest: it provides a dictionary which transfers the tameness properties of the source
into asymptotic properties of the sequence $T(n)$. The following theorem gathers and makes more precise
results that are already obtained in Q or [26]:



Theorem 2 [Clément, Fill, Flajolet, Vallée]. The asymptotics of each cost $T(n)$ of interest, relative to a
parameter of a tree built on a general source $\mathcal{S}$, and defined in Theorem 1, is of the following general
form: $T(n) = P_T(n) + E(n)$. The "principal term" $P_T(n)$ involves the entropy $h(\mathcal{S})$ under the form

$$P_R(n) = \frac{1}{h(\mathcal{S})}\, n, \qquad P_C(n) = \frac{1}{h(\mathcal{S})}\, n \log n + a\,n, \qquad P_B(n) = \frac{1}{h(\mathcal{S})}\, n \log^2 n + b\, n \log n + c\, n,$$

together with some other constants $a, b, c$. The "error term" $E(n)$ admits the following possible forms,
depending on the tameness of the source:

(a) If $\mathcal{S}$ is S-tame with abscissa $\delta_0$, then $E(n) = O(n^{1-\delta})$, for any $\delta < \delta_0$.

(b) If $\mathcal{S}$ is H-tame with exponent $\alpha_0$, then $E(n) = n \cdot O(\exp[-(\log n)^{\alpha}])$ for any $\alpha < 1/(\alpha_0 + 1)$.

(c) If $\mathcal{S}$ is periodic with abscissa $\delta_0$, then $E(n) = n \cdot \Phi(\log n) + O(n^{1-\delta})$, for any $\delta < \delta_0$,

where $n \cdot \Phi(\log n)$ is the part of the expansion brought by the family of the non-real poles located on the
vertical line $\Re s = 1$, and involves a periodic function $\Phi$.

Note that the "error term" $E(n)$ is not always an actual error term: in the case of the trie size, for a
periodic source, the fluctuation terms given by $E(n)$ arise in the main term. However, in all the other
cases, the term $E(n)$ is indeed an error term. The main term of the principal term always involves a
constant equal to $1/h(\mathcal{S})$, and the order of the main term depends on the tree parameter: it is always of
the form $n \log^k n$, and the integer $k$ equals the order of the pole $s = 1$ for the Dirichlet series $\varpi_T(s)$: one
has $k = 0$ for the Trie size, $k = 1$ for the Trie path length, and $k = 2$ for the Bst symbol path length.
This result proves that, with respect to the number of symbol comparisons, the Bst is much less efficient
than the Trie.

² This implies that $\varpi_T(s)$ admits singularities at all the points $1 + ikt_0$ for any integer $k$, and is of polynomial growth on a
family of horizontal lines $t = t_k$ with $t_k \to \infty$, and on vertical lines $\Re(s) = 1 - \delta'$ with some $\delta' < \delta$.

1.8. Tameness of simple sources. We show that tameness properties that are described in Definition 1
arise in a natural way for simple sources. Even if S-tameness never occurs for simple sources, we will see
later that it "often" occurs for more "complex" sources. We now focus on the memoryless case,
defined by the probabilities $\mathcal{P} = (p_1, p_2, \ldots, p_r)$, to which we associate the ratios $\alpha_{k,j} := \log p_j / \log p_k$.
Then, tameness properties depend on arithmetic properties of the ratios $\alpha_{k,j}$.

Proposition 1 Any simple source (memoryless source or irreducible aperiodic Markov chain) is
entropic. A memoryless source is periodic if and only if, for any fixed $k$, all the real numbers $\alpha_{k,j}$ are rationals
with the same denominator.
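
The periodic case is easy to observe numerically, since the poles of $\Lambda(s) = 1/(1 - \lambda(s))$ are the zeros of $1 - \lambda(s)$. In the Python sketch below (an illustration with two hypothetical sources chosen by us), the source $(1/2, 1/4, 1/4)$ has all ratios $\log p_j / \log p_k$ rational, and $\lambda$ equals 1 at $s = 1 + 2i\pi/\log 2$; for the source $(1/3, 2/3)$ the same point is not a zero of $1 - \lambda$.

```python
import cmath
from math import log, pi

def lam(probs, s):
    """lambda(s) = sum_j p_j^s for a complex parameter s."""
    return sum(cmath.exp(s * log(p)) for p in probs)

periodic = (0.5, 0.25, 0.25)     # log p_j / log p_k in {1/2, 1, 2}: all rational
aperiodic = (1 / 3, 2 / 3)       # log(2/3) / log(1/3) is irrational
s0 = complex(1.0, 2 * pi / log(2))

gap_periodic = abs(1 - lam(periodic, s0))     # vanishes: Lambda has a pole at s0
gap_aperiodic = abs(1 - lam(aperiodic, s0))   # stays away from 0 at this point
```

At `s0` both terms of the aperiodic sum have the same phase, so $|\lambda(s_0)| = 1$ there, but $\lambda(s_0) \ne 1$; this is only a check at one point and does not by itself prove the absence of poles on the whole line.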

We now focus on non-periodic memoryless sources, where there exists, amongst all the reals $\alpha_{k,j}$, at
least one real $\alpha_{k,j}$ which is irrational. In this case, there is no other pole of $\Lambda(s)$ than $s = 1$ on the vertical
line $\Re s = 1$, but there exist poles of $\Lambda$ which are arbitrarily close to the vertical line $\Re s = 1$. This entails
that a simple source is never strongly tame. The distribution of distances of the poles with respect to
the vertical line $\Re s = 1$ depends on the degree of approximability of the family $\alpha$ by rationals, as it was
first remarked in Q. We recall some notions on diophantine approximations (see for instance [16]). The
irrationality exponent of a real $x$ is defined by



A number $x$ is diophantine if its irrationality exponent is finite. The following result provides a
characterisation of H-tameness for simple sources. It can be found in a more precise form in [12], where the
authors revisit previous results of [17].

Theorem 3 [Flajolet, Roux, Vallée]. A memoryless source is H-tame if and only if it is diophantine.
Moreover, there is a relation between the exponent $\alpha$ of H-tameness and the irrationality exponent
$\mu(\mathcal{P})$: one can choose as $\alpha$ any real strictly greater than $2\mu(\mathcal{P}) + 2$, and it is in a precise sense the best
possible choice.

With the general Theorem 2, together with Propositions 1 and 2, we can precisely describe the asymptotic 
probabilistic behaviour of two main tree data structures built on words produced by memoryless sources. 
Generally speaking, Theorem 2 can be applied to tree structures built on a general source as soon as its
tameness may be studied. The remainder of the paper describes a general class of sources, which contains
the simple sources, for which tameness properties can be precisely studied. We will see that tameness of 
these general sources may be quite different from tameness of simple sources. 

2 Tameness of dynamical sources. 

We first define the class of dynamical sources and explain their relation with simple sources. Then, 
we recall the expression of the Dirichlet series $\Lambda(s)$ as a function of the secant transfer operator of
the underlying dynamical systems. Finally, we exhibit sufficient conditions on the underlying dynamical 
system under which it is possible to prove tameness properties [Theorem 4 for S-tameness, and Theorem 
5 for H-tameness]. 
2.1. Definition of dynamical sources. A dynamical source, defined in [25], is closely related to a
dynamical system on the interval.

Definition 2 A dynamical system of the interval $\mathcal{I} := [0, 1]$ is defined by a mapping $T : \mathcal{I} \to \mathcal{I}$ (called
the shift) for which

(a) there exists a finite alphabet $\Sigma$, and a topological partition of $\mathcal{I}$ with disjoint open intervals $(\mathcal{I}_m)_{m \in \Sigma}$,
i.e. $\bar{\mathcal{I}} = \cup_{m \in \Sigma} \bar{\mathcal{I}}_m$.

(b) The restriction of $T$ to each $\mathcal{I}_m$ is a bijection from $\mathcal{I}_m$ to $T(\mathcal{I}_m)$.

The system is complete when each restriction is surjective, i.e., $T(\mathcal{I}_m) = \mathcal{I}$. The system is Markovian
when each interval $T(\mathcal{I}_m)$ is a union of intervals $\mathcal{I}_j$.

A dynamical system, together with a distribution $G$ on the unit interval $\mathcal{I}$, defines a probabilistic source,
which is called a dynamical source and is now described (see also Fig. 1 at the end). The map $T$ is used
as a shift mapping, and the mapping $\tau$ whose restriction to each $\mathcal{I}_m$ is equal to $m$, is used for coding.
The words are emitted as follows [see Figure 3]: to each real $x$ (except for a denumerable set), one
associates the trajectory $\mathcal{T}(x) = (x, T(x), T^2(x), \ldots, T^k(x), \ldots)$, which gives rise, via the mapping $\tau$, to
the word $M(x) \in \Sigma^{\mathbb{N}}$,

$$M(x) = (m_1(x), m_2(x), \ldots, m_k(x), \ldots) \quad \text{with} \quad m_k(x) = \tau(T^{k-1}(x)).$$

Given a prefix $w \in \Sigma^\star$, the set of all reals $x$ for which the word $M(x)$ begins with the prefix $w$ is
an interval, the fundamental interval associated to $w$, and the measure of this interval (with respect to
distribution $G$) is the fundamental probability $p_w$ of the source. In the case of a complete system, one
denotes by $h_{[m]}$ the local inverse of $T$ restricted to $\mathcal{I}_m$ and by $\mathcal{H}$ the set $\mathcal{H} := \{h_{[m]}, m \in \Sigma\}$ of all local
inverses. Each local inverse of the $k$-th iterate $T^k$ is then associated to a word $w = m_1 m_2 \ldots m_k \in \Sigma^k$; it is
of the form $h_{[w]} := h_{[m_1]} \circ h_{[m_2]} \circ \cdots \circ h_{[m_k]}$, and

$$\mathcal{I}_w = h_{[w]}(\mathcal{I}), \qquad p_w = |G(h_{[w]}(1)) - G(h_{[w]}(0))|. \qquad (8)$$

The set of all the inverse branches of $T^k$ is $\mathcal{H}^k = \{h_{[w]}, w \in \Sigma^k\}$. For $h \in \mathcal{H}^k$, the number $k$ is called
the depth of $h$ and it is denoted by $p(h)$. We denote by $\mathcal{H}^\star := \cup_{k \ge 0} \mathcal{H}^k$ the set of all inverse branches.
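
The emission mechanism and formula (8) can be tried out on two classical shifts, the binary shift $T(x) = \{2x\}$ and the continued fraction shift $T(x) = \{1/x\}$. The Python sketch below is illustrative only: the helper names are ours, exact rationals are used as inputs (such points may belong to the exceptional denumerable set, which is harmless for a demonstration), and the distribution $G$ is assumed uniform.

```python
from fractions import Fraction
from math import floor

def emit(shift, code, x, length):
    """First `length` symbols m_k(x) = code(T^(k-1)(x)) of the word M(x)."""
    word = []
    for _ in range(length):
        word.append(code(x))
        x = shift(x)
    return word

# Binary system: T(x) = {2x}, coded by the symbol floor(2x).
binary_shift = lambda x: 2 * x - floor(2 * x)
binary_code = lambda x: floor(2 * x)

# Continued fraction system: T(x) = {1/x}, coded by floor(1/x); requires x != 0.
gauss_shift = lambda x: 1 / x - floor(1 / x)
gauss_code = lambda x: floor(1 / x)

def gauss_branch(word):
    """Inverse branch h_[w] = h_[m1] o ... o h_[mk], where h_[m](y) = 1 / (m + y)."""
    def h(y):
        for m in reversed(word):      # the innermost branch h_[mk] is applied first
            y = 1 / (m + y)
        return y
    return h

h = gauss_branch([2, 3])
# Fundamental probability of the prefix (2, 3) for a uniform G, by formula (8).
p_w = abs(h(Fraction(1)) - h(Fraction(0)))
```

With $x = 7/16$, the continued fraction source emits 2, 3, 2, and indeed $7/16$ lies in the fundamental interval with endpoints $3/7$ and $4/9$ of the prefix $(2, 3)$, whose length is $1/63$.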

Such sources may possess a high degree of correlations, due to the geometry of the branches and also to 
the shape of branches. 

The geometry of the branches is defined by the respective positions of "horizontal" intervals $\mathcal{I}_m$ with
respect to "vertical" intervals $\mathcal{J}_m := T(\mathcal{I}_m)$ and allows one to describe the set $\mathcal{S}_m$ formed with symbols
which can be possibly emitted after symbol $m$. The geometry of the system then provides a first access
to the correlation between successive symbols. In particular, in a complete system, any symbol of $\Sigma$ can
be emitted after any symbol $m$, and thus the equality $\mathcal{S}_m = \Sigma$ always holds.

The shape of the branches, and more precisely, the behaviour of derivatives $|h'_{[m]}|$ has also a great influence
on correlations between symbols. For a fixed geometry of the branches, a system with affine branches is
"less correlated" than the other systems with the same geometry. The contraction properties of $\mathcal{H}$ (i.e.,
the fact that $|h'| < 1$) are also essential, since they give rise to chaotic behaviour of the trajectories.

2.2. Simple sources viewed as dynamical sources. All memoryless sources and all Markov chain
sources belong to the general framework of dynamical sources and correspond, from this viewpoint, to a
piecewise linear shift. For instance, the standard binary system is obtained by
$T(x) = \{2x\}$ ($\{\cdot\}$ is the fractional part). More precisely:
- A memoryless source is a complete dynamical source, with affine branches and a uniform initial
distribution,

- A Markov chain is a Markovian dynamical source, with affine branches and a family of uniform initial
distributions on each $\mathcal{I}_j$.

Figure 3 shows three instances of simple sources, viewed as dynamical sources. 

However, as soon as the derivatives h' of the branches are not constant, there exist correlations between 
successive symbols, and the dynamical source is no longer simple. Dynamical sources with a non-linear 
shift allow for correlations that depend on the entire past. A main instance is the dynamical source 
relative to the Gauss map, represented in Figure 3, which underlies the Euclid Algorithm and is defined 
on the unit interval via the shift $T$:

$$T(0) = 0, \qquad T(x) = \frac{1}{x} - \left\lfloor \frac{1}{x} \right\rfloor \quad (x \ne 0). \qquad (9)$$









Figure 3: (Up) A dynamical system, with $\Sigma = \{a,b,c\}$ and a word $M(x) = (c,b,a,c,\ldots)$. (Down) Two memoryless
sources and a Markov chain, viewed as dynamical sources. The continued fraction source.



2.3. Transfer operators. One of the main tools in dynamical system theory is the transfer operator
introduced by Ruelle, denoted by $\mathbf{H}_s$. It generalizes the density transformer $\mathbf{H}$ that describes the evolution
of the density.

We here consider the case of a complete dynamical system: if $f = f_0$ denotes the initial density on $\mathcal{I}$,
and $f_1$ the density on $\mathcal{I}$ after one iteration of $T$, then $f_1$ can be written as $f_1 = \mathbf{H}[f_0]$, where $\mathbf{H}$ is defined
by

$$\mathbf{H} := \sum_{h \in \mathcal{H}} \mathbf{H}_{[h]} \quad \text{with} \quad \mathbf{H}_{[h]}[f](x) := |h'(x)| \cdot f \circ h(x).$$
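
As an illustration of the density transformer, one may take the continued fraction system of (9), whose inverse branches are $h_{[m]}(x) = 1/(m+x)$ with $|h_{[m]}'(x)| = 1/(m+x)^2$. A classical fact is that the Gauss density $1/((1+x)\log 2)$ is invariant under this system, i.e. it is a fixed point of $\mathbf{H}$. The Python sketch below checks this numerically; the truncation of the infinite sum to 200000 branches is an assumption of ours, and the function names are hypothetical.

```python
from math import log

def gauss_density(x):
    """Gauss density 1 / ((1 + x) log 2) on the unit interval."""
    return 1.0 / (log(2) * (1.0 + x))

def density_transformer(f, x, branches=200_000):
    """Truncated H[f](x) = sum_m |h_m'(x)| f(h_m(x)) with h_m(y) = 1 / (m + y)."""
    total = 0.0
    for m in range(1, branches + 1):
        total += f(1.0 / (m + x)) / (m + x) ** 2
    return total
```

For the Gauss density the $m$-th term equals $\frac{1}{\log 2}\big(\frac{1}{m+x} - \frac{1}{m+x+1}\big)$, so the sum telescopes and the truncation error after $M$ branches is exactly $1/((M+1+x)\log 2)$, below $10^{-5}$ here.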



The transfer operator extends the density transformer; it depends on a complex parameter $s$,

$$\mathbf{H}_s := \sum_{h \in \mathcal{H}} \mathbf{H}_{[h],s} \quad \text{with} \quad \mathbf{H}_{[h],s}[f](x) := |h'(x)|^s \cdot f \circ h(x), \qquad (10)$$

and coincides with $\mathbf{H}$ when $s = 1$. Here, we are interested in generating the fundamental probabilities,
whose expression is provided in (8) in the case of a complete dynamical system. The main tool is a
generalized version of the transfer operator, the secant transfer operator, introduced by Vallée in [25].
This operator involves the secant function of inverse branches (instead of their derivatives), and it acts on
functions $F$ of two variables; for $s \in \mathbb{C}$ and $h \in \mathcal{H}$, we first define the component secant operator $\mathbb{H}_{[h],s}$
as

$$\mathbb{H}_{[h],s}[F](x,y) := \left| \frac{h(x) - h(y)}{x - y} \right|^s F(h(x), h(y)), \qquad (11)$$

and the secant transfer operator is defined as

$$\mathbb{H}_s := \sum_{h \in \mathcal{H}} \mathbb{H}_{[h],s}. \qquad (12)$$

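Denote by $\mathrm{diag}\,F$ the function defined by $\mathrm{diag}\,F(x) := F(x,x)$. The equality $\mathbb{H}_s[F](x,x) = \mathbf{H}_s[\mathrm{diag}\,F](x)$
holds on the diagonal $x = y$ and shows that the secant operator is an extension of the plain transfer
operator. Moreover, multiplicative properties of secants then entail the relation

$$\mathbb{H}_s^k = \sum_{h \in \mathcal{H}^k} \mathbb{H}_{[h],s}, \quad \text{so that} \quad \mathbb{H}_s^k[F](x,y) = \sum_{h \in \mathcal{H}^k} \left| \frac{h(x) - h(y)}{x - y} \right|^s F(h(x), h(y)).$$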



Finally, the Dirichlet series can be expressed as a quasi-inverse of the secant operator: this is a nice
extension of the expressions obtained for simple sources, in (3), (4).

Proposition 2 [Vallée]. For a complete dynamical source, relative to a shift $T$ and a distribution $G$, the
Dirichlet series of the source admits an alternative expression which involves the quasi-inverse of the
secant operator, defined in (12), applied to the function $L^s$, where $L$ is the secant of the distribution $G$,

$$\Lambda(s) = (I - \mathbb{H}_s)^{-1}[L^s](0,1), \quad \text{with} \quad L(x,y) := \left| \frac{G(x) - G(y)}{x - y} \right|.$$



2.4. Tameness of dynamical sources. Here, we consider subclasses of dynamical sources, for which
the quasi-inverse has nice spectral properties. This will entail, with Proposition 2, nice properties for the
function $\Lambda(s)$, from which one deduces tameness properties. The main results are as follows: there exist
natural instances of dynamical sources which are S-tame, or H-tame. A "random" dynamical source is
"very often" S-tame: this happens as soon as its inverse branches do not have "too often" the same "shape".
A dynamical source can be periodic only if it "closely resembles" a memoryless source. A dynamical
source is H-tame if, informally speaking, its arithmetical properties are the same as the arithmetical
properties of an H-tame memoryless source. More precisely, we define three (large) subclasses of
dynamical sources (the Good Class, the UNI Class, the DIOP Class) for which we can describe the tameness in
an informal setting. The UNI Class has been already studied and described in previous works [5], [1], [2], [3].
The original part of our work is related to the DIOP Class, for which we revisit and extend previous results
described in [6], [18], [19]. We first state the main tameness results for dynamical sources in an informal
way:

Theorem. All the sources of the Good-UNI Class are S-tame. All the sources of the Good-DIOP Class 
are H-tame. A source of the Good Class may be periodic only if it is conjugated to a source with affine 
branches. 

2.5. The Good Class. We first define the Good Class, for which the shift is expansive, and gives rise to a 
chaotic behaviour for the trajectories. 
Definition 3 [Good Class]. A dynamical system of the interval $(\mathcal{I}, T)$ belongs to the Good Class if it is
complete, with a set $\mathcal{H}$ of inverse branches which satisfies the following:

(G1) The set $\mathcal{H}$ is uniformly contracting, i.e., there exists a constant $\rho < 1$, for which

$$\forall h \in \mathcal{H}, \qquad \beta_h := \sup\{|h'(x)|;\ x \in \mathcal{I}\} \le \rho.$$

(G2) There is a constant $A > 0$, so that every inverse branch $h \in \mathcal{H}$ satisfies $|h''| \le A|h'|$.

(G3) There exists $\sigma_0 < 1$ for which the series $\sum_{h \in \mathcal{H}} \beta_h^s$ converges on $\Re s > \sigma_0$.

The essential condition is (G1). The bounded distortion property (G2) and the property (G3) are technical
conditions that are always fulfilled for a finite alphabet $\Sigma$.

When the dynamical system belongs to the Good Class, the transfer operators (tangent and secant) act on
spaces of functions of class $\mathcal{C}^1$. They admit dominant spectral properties for $s$ near the real axis, together
with a spectral gap. This implies that, for $s$ near 1, the function $\Lambda(s)$ is meromorphic for $s$ with a small
imaginary part, and admits a simple pole at $s = 1$.

2.6. The UNI Condition. One first defines a probability $\Pr_n$ on each set $\mathcal{H}^n \times \mathcal{H}^n$, in a natural way,
and lets $\Pr_n\{(h,k)\} := |h(\mathcal{I})| \cdot |k(\mathcal{I})|$, where $|\mathcal{J}|$ denotes the length of the interval $\mathcal{J}$. Furthermore,
$\Delta(h,k)$ denotes the "distance" between two inverse branches $h$ and $k$ of same depth, defined as

$$\Delta(h,k) = \inf_{x \in \mathcal{I}} |\Psi'_{h,k}(x)| \quad \text{with} \quad \Psi_{h,k}(x) = \log \left| \frac{h'(x)}{k'(x)} \right|.$$

The distance $\Delta(h,k)$ is a measure of the difference between the "shape" of the two branches $h, k$. The
UNI Condition, stated as follows [5], is a geometric condition which expresses that the probability that
two inverse branches have almost the same "shape" is very small:

Definition 4 [Condition UNI]. A dynamical system $(\mathcal{I}, T)$ satisfies the UNI condition if its set $\mathcal{H}$ of
inverse branches satisfies the following:

(U1) For any $a \in ]0, 1[$, and for any integer $n$, one has $\Pr_n[\Delta < \rho^{an}] \ll \rho^{an}$.

(U2) Each $h \in \mathcal{H}$ is of class $\mathcal{C}^3$ and for any $n$, there exists $B_n$ for which $|h'''| \le B_n |h'|$ for any $h \in \mathcal{H}^n$.

For a source with affine branches, the "distance" $\Delta$ is always zero, and the probabilities of Assertion
(U1) are all equal to 1. Such a source never satisfies the Condition UNI. Conversely, a dynamical source
of the Good-UNI Class cannot be conjugated to a source with affine branches, as proven by Baladi
and Vallée [1]. Thus, the condition UNI excludes all the simple sources, which cannot be S-tame. The
strength of the Condition UNI is due to the fact that this condition is sufficient to imply strong tameness:

Theorem 4 [Dolgopyat, Baladi-Vallée, Cesaratto-Vallée]. When a dynamical system of the Good Class
satisfies the condition UNI, it gives rise to an S-tame source.

There are natural instances of sources that belong to the Good-UNI Class, for instance the Euclidean
dynamical system defined in (9), together with two other dynamical systems of the Euclidean type.

2.7. The diophantine conditions. The Good-UNI Class gathers systems which are quite different from
systems with affine branches. The DIOP Condition "copies" the behaviour of memoryless sources, when
they are H-tame. In this case, we recall that there exists a ratio $\log p_i / \log p_k$ which is diophantine, i.e.,
whose irrationality exponent is finite.

The DIOP condition is an arithmetical condition, which extends this condition to a system of the Good
Class. For an inverse branch $h$, one denotes by $h^\star$ its unique fixed point (such a point exists and is unique
for a system of the Good Class), by $p(h)$ its depth, and one lets, for $h, k, \ell$ in $\mathcal{H}^\star$,

(13)

We can now state the definition of diophantine dynamical sources:

Definition 5 [DIOP2 and DIOP3]. A dynamical source is 2-diophantine ([DIOP2] in shorthand) if there
exist two branches $h$ and $k$ of $\mathcal{H}^\star$ for which the ratio $c(h,k)$ is diophantine.

A dynamical source is 3-diophantine ([DIOP3] in shorthand) if there exist three branches $h$, $k$ and $\ell$ of
$\mathcal{H}^\star$ for which the ratio $c(h,k,\ell)$ is diophantine.

The following result proves that these conditions are sufficient to entail H-tameness of associated sources.
This is the main contribution of Roux's PhD thesis [22]. The appendix contains hints on the proof, which
will be detailed in the long version.

Theorem 5 [Dolgopyat, Naud, Melbourne, Roux-Vallée]

(a) A dynamical system of the Good Class, which is moreover DIOP3, gives rise to an H-tame source.

(b) A dynamical system of the Good Class, which is moreover DIOP2, gives rise to an H-tame source.

2.8. A little piece of history. Dolgopyat, in two seminal papers [5, 6], introduces the Conditions UNI and
DIOP2. He proves that, under these conditions, the quasi-inverse of the plain (tangent) transfer operator
has nice properties in a region on the left of the line $\{\Re s = 1\}$: when the UNI Condition holds, the region
is a vertical strip, and when the DIOP2 Condition holds, the region is of hyperbolic type. However, he
does not consider the case of an infinite number of branches, and his results are extended to this case by
Baladi and Vallée in [1] for the UNI condition, and by Melbourne [18] in the case of the DIOP condition,
who introduces the DIOP3 Condition. However, in order to deal with the Dirichlet series $\Lambda(s)$, one needs
to extend the previous proofs to the secant operator. This has been done by Cesaratto and Vallée in
[3] for the UNI Condition. Here, we deal with the DIOP conditions and we perform two extensions:
we consider a possibly infinite alphabet, and we deal both with the DIOP3 condition (where we use a method
due to Melbourne [18]) and the DIOP2 condition (where we use a method due to Naud [19]). We also extend
these results to the secant operator.

Acknowledgements. This work takes place inside the ANR project MAGNUM [Methodes Algorithmiques 
pour la Generation Non Uniforme: Modeles et Applications] [ANR 2010 BLAN 0204] . 
3 Some hints on the proof of Theorem 5. 

Since the Dirichlet series $\Lambda(s)$ is expressed with the quasi-inverse of the secant operator, we study the 
behaviour of this quasi-inverse on the vertical line $\Re s = 1$. It is closely related to the behaviour of the 
operators $M_t$, $\mathbb{M}_t$ defined by 

$$M_t[f](x) := |T'(x)|^{it}\, f \circ T(x), \qquad \mathbb{M}_t[F](x,y) := \left| \frac{T(x) - T(y)}{x - y} \right|^{it} F(T(x), T(y)).$$
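
The two operators can be made concrete on a specific map. The sketch below uses the Gauss map $T(x) = 1/x - \lfloor 1/x \rfloor$ as the example system; this choice, and all function names, are assumptions made for illustration only. It shows that the secant weight uses the slope of the chord between $(x, T(x))$ and $(y, T(y))$, and that the secant operator approaches the tangent one when $y$ tends to $x$ inside the same branch.

```python
import cmath
import math


def gauss_map(x):
    """Gauss map T(x) = 1/x - floor(1/x) on ]0, 1[ (example system)."""
    return 1.0 / x - math.floor(1.0 / x)


def gauss_derivative(x):
    """Derivative T'(x) = -1/x^2, valid inside each branch."""
    return -1.0 / (x * x)


def tangent_operator(f, t):
    """Return M_t[f], where M_t[f](x) = |T'(x)|^{it} f(T(x))."""
    def Mf(x):
        weight = cmath.exp(1j * t * math.log(abs(gauss_derivative(x))))
        return weight * f(gauss_map(x))
    return Mf


def secant_ratio(x, y):
    """|T(x) - T(y)| / |x - y|, meaningful for x != y in the same branch."""
    return abs((gauss_map(x) - gauss_map(y)) / (x - y))


def secant_operator(F, t):
    """Return the secant analogue: |secant ratio|^{it} F(T(x), T(y))."""
    def MF(x, y):
        weight = cmath.exp(1j * t * math.log(secant_ratio(x, y)))
        return weight * F(gauss_map(x), gauss_map(y))
    return MF


if __name__ == "__main__":
    t = 5.0
    F = lambda x, y: 0.5 * (x + y)   # a function on the square
    f = lambda x: F(x, x)            # its restriction to the diagonal
    x, y = 0.3, 0.3 + 1e-6           # two points of the same branch
    print(secant_operator(F, t)(x, y), tangent_operator(f, t)(x))
```

For the Gauss map the secant ratio of two points of the same branch equals $1/(xy)$, which reduces to $|T'(x)| = 1/x^2$ on the diagonal.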



3.1. Various possibilities for the spectral radius of the operator on $\Re s = 1$. The starting point 
is the following proposition, which is classical for the tangent operator, and can be easily extended to the 
secant operator. 

Proposition 3. Consider a dynamical system of the Good Class and its secant transfer operator $\mathbb{H}_s$, acting 
on the space $\mathcal{C}^1(\mathcal{I} \times \mathcal{I})$ for a parameter $s$ of the form $s = 1 + it_0$, with $t_0 \neq 0$. 

(a) For a complex number $\lambda$ of modulus 1, the two conditions are equivalent: 

(a1) The complex number $\lambda$ belongs to the spectrum $\mathrm{Sp}\, \mathbb{H}_{1+it_0}$. 
(a2) The complex number $\lambda$ is an eigenvalue of $\mathbb{M}_{t_0}$. 

(b) Assume that there exists $t_0 \neq 0$ for which the condition (a2) is satisfied. Then, there exist $a \neq 0$ and 
$b$ for which the quantities $c(h) - b$ all belong to the $\mathbb{Z}$-module $\mathbb{Z}a$. 

(b1) If $\lambda$ is a root of unity, then all the ratios $c(h,k)$ are rational. 

(b2) If $\lambda$ is any complex number of modulus 1, all the ratios $c(h,k,\ell)$ are rational. 

(c) If one of the two conditions is satisfied 

(c1) there exists a ratio $c(h,k)$ which is not rational, 

(c2) for any $t \neq 0$, the spectrum of the operator $\mathbb{M}_t$ does not contain $\lambda = 1$, 

then the quasi-inverse $(I - \mathbb{H}_s)^{-1}$ is analytic on $\Re s = 1$ except at $s = 1$ where it has a simple pole. 

(d) If one of the two conditions is satisfied 

(d1) there exists a ratio $c(h,k,\ell)$ which is not rational, 

(d2) for any $t \neq 0$, the spectrum of the operator $\mathbb{M}_t$ does not contain any $\lambda$ with $|\lambda| = 1$, 

then the spectral radius of $\mathbb{H}_s$ is strictly less than 1 on $\{s;\ \Re s = 1,\ s \neq 1\}$ and, for any $\lambda$ of modulus 1, the 
quasi-inverse $(I - \lambda \mathbb{H}_s)^{-1}$ is analytic on the line $\Re s = 1$ except at $s = 1$ where it admits a simple pole. 

3.2. Reinforcement of conditions (c), (d). The main question is now as follows: if one of the conditions 
(c1) or (d1) or (c2) or (d2) is replaced by a stronger condition, is it possible to obtain a conclusion about 
tameness, of the following kind: 

(R) There exists a region on the left of the vertical line $\Re s = 1$ on which the quasi-inverse $(I - \mathbb{H}_s)^{-1}$ is 
analytic except at $s = 1$ (where it admits a simple pole), and is of polynomial growth for $t = \Im s \to \infty$. 

We deal here with the Banach space $\mathcal{C}^1(\mathcal{I} \times \mathcal{I})$ formed with functions of class $\mathcal{C}^1$ on the unit square, 
endowed with the norm $\|\cdot\|_1$ defined by $\|u\|_1 := \sup |u(x,y)| + \sup \|u'(x,y)\|$, but we also use a norm $\|\cdot\|_{(t)}$ 
which depends on the imaginary part $t$ of $s$, defined by $\|u\|_{(t)} := \sup |u(x,y)| + (1/|t|) \sup \|u'(x,y)\|$. Our 
main object of study is 

$$M(t) := \left\| (I - \mathbb{H}_{1+it})^{-1} \right\|_{(t)} \qquad (14)$$
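
A one-line example may help to see why a norm that depends on $t$ is natural here. The oscillating function $u_t(x,y) := e^{itx}$ is chosen purely for illustration and does not come from the paper. Its derivative has size $|t|$, so its plain norm grows with $|t|$ while its $t$-dependent norm stays bounded:

```latex
\[
  \sup |u_t| = 1, \qquad \sup \|u_t'\| = |t|,
  \qquad\text{so}\qquad
  \|u_t\|_1 = 1 + |t|, \qquad
  \|u_t\|_{(t)} = 1 + \frac{1}{|t|}\,|t| = 2 .
\]
```

Functions that oscillate at frequency $|t|$, which is the natural scale for an operator whose weight is raised to the power $it$, therefore form a bounded family for the second norm.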
A possible reinforcement DIOP3 of the condition (d1) is "There exists a triple $(h,k,\ell)$ for which $c(h,k,\ell)$ 
is diophantine". A possible reinforcement (d3) of the condition (d2) is: "The operator $\mathbb{M}_t$ does not admit 
a system of almost eigenfunctions", for which a more formal statement will be provided later. We will 
also see that these two reinforcements are not independent since the implication DIOP3 $\Longrightarrow$ (d3) holds. 

A possible reinforcement DIOP2 of the condition (c1) is "There exists a pair $(h,k)$ for which $c(h,k)$ is 
diophantine". A possible reinforcement (c3) of the condition (c2) is: "The operator $\mathbb{M}_t$ does not admit a 
system of almost invariant functions", for which a more formal statement will be provided later. We will 
also see that these two reinforcements are not independent since the implication DIOP2 $\Longrightarrow$ (c3) holds. 



3.3. Precise statement of Theorem 5. There are two theorems, one for each condition DIOP2 or 
DIOP3. 

Theorem 5(a). [DIOP3] Consider a dynamical source of the Good Class, with a possibly infinite 
denumerable alphabet, with a contraction ratio $\rho < 1$. If there exists a triple $\{h,k,\ell\}$, with 
$\nu = \max\{c(h), c(k), c(\ell)\}$, for which $c(h,k,\ell)$ is diophantine with exponent $\mu$, then $M(t)$ is of 
polynomial growth, with an exponent strictly larger than 

$$4\mu + 3 + \nu\, \frac{2\mu + 4}{|\log \rho|}.$$
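
To get a feeling for the size of this threshold, it can be evaluated numerically. The helper below is a minimal sketch that reads the threshold of Theorem 5(a) as $4\mu + 3 + \nu(2\mu+4)/|\log\rho|$; the function name and the sample values are hypothetical and serve only as an illustration.

```python
import math


def tameness_exponent_threshold(mu, nu, rho):
    """Evaluate 4*mu + 3 + nu*(2*mu + 4)/|log(rho)|.

    mu  : diophantine exponent of the ratio c(h, k, l)
    nu  : max of c(h), c(k), c(l)
    rho : contraction ratio of the system, with 0 < rho < 1
    """
    if not 0.0 < rho < 1.0:
        raise ValueError("the contraction ratio rho must lie in ]0, 1[")
    return 4.0 * mu + 3.0 + nu * (2.0 * mu + 4.0) / abs(math.log(rho))


if __name__ == "__main__":
    # Sample values only: a stronger contraction (smaller rho) lowers the threshold.
    for rho in (0.5, 0.25, 0.1):
        print(rho, tameness_exponent_threshold(mu=1.0, nu=1.0, rho=rho))
```

The threshold grows linearly with the diophantine exponent $\mu$ and blows up when the contraction ratio $\rho$ tends to 1.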



Theorem 5(b). [DIOP2] Consider a dynamical source of the Good Class, with a possibly infinite 
denumerable alphabet, with a contraction ratio $\rho < 1$ and a pressure function $s \mapsto L(s)$¹. Consider the real $\nu_1$ 
defined from the pressure function by the two equations 

$$(1 - \sigma_1) L'(\sigma_1) + L(\sigma_1) = \log \rho, \qquad \nu_1 = -L'(\sigma_1).$$

If there exists a pair $\{h,k\}$, with $\nu_0 = \max\{c(h), c(k)\}$, for which $c(h,k)$ is diophantine with exponent $\mu$, 
then $M(t)$ is of polynomial growth, with an exponent strictly larger than 

$$4\mu + 3 + \nu\, \frac{2\mu + 4}{|\log \rho|} \qquad \text{with } \nu = \max(\nu_0, \nu_1).$$
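
The two equations defining $\nu_1$ say that the tangent line to the pressure $L$ at $\sigma_1$ passes through the point $(1, \log\rho)$, and that $\nu_1$ is the opposite of its slope. The sketch below solves them by bisection in the simplest setting, under assumptions that are not part of the statement: the source is memoryless with probabilities $p_i$, so that the dominant eigenvalue is $\lambda(s) = \sum_i p_i^s$, and the contraction ratio is taken as $\rho = \max_i p_i$. Function names and sample probabilities are hypothetical.

```python
import math


def pressure(probs, s):
    """L(s) = log(sum_i p_i^s), the pressure of a memoryless source (assumed model)."""
    return math.log(sum(p ** s for p in probs))


def pressure_derivative(probs, s):
    """L'(s) = sum_i p_i^s log(p_i) / sum_i p_i^s."""
    weights = [p ** s for p in probs]
    return sum(w * math.log(p) for w, p in zip(weights, probs)) / sum(weights)


def solve_sigma1(probs, rho, lo=-50.0, hi=1.0, iterations=200):
    """Solve (1 - s) L'(s) + L(s) = log(rho) for s in [lo, hi] by bisection.

    The left-hand side has derivative (1 - s) L''(s) >= 0 for s < 1 because
    L is convex, so it is non-decreasing on the bracket.
    """
    def g(s):
        return ((1.0 - s) * pressure_derivative(probs, s)
                + pressure(probs, s) - math.log(rho))

    if not (g(lo) < 0.0 < g(hi)):
        raise ValueError("no sign change on the bracket [lo, hi]")
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    probs = [0.5, 0.3, 0.2]          # sample memoryless source
    sigma1 = solve_sigma1(probs, rho=max(probs))
    nu1 = -pressure_derivative(probs, sigma1)
    print(sigma1, nu1)
```

In this model $\nu_1$ is a weighted average of the quantities $|\log p_i|$, so it always lies between $|\log \max_i p_i|$ and $|\log \min_i p_i|$.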



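3.4. Main sets of interest. One considers triples $(\mathcal{T}, W, \eta)$ formed with 
(i) a subset $\mathcal{T}$ of the set $\{t \in \mathbb{R},\ |t| > 1\}$, 

(ii) a family of functions, $W := \{w_t,\ t \in \mathcal{T};\ w_t \in \mathcal{C}^1(\mathcal{I} \times \mathcal{I}),\ |w_t| = 1,\ \|w_t\|_{(t)} \le K\}$, 

(iii) a family $\eta$ of complex numbers, $\eta := \{\eta_t \in \mathbb{C},\ t \in \mathcal{T},\ |\eta_t| = 1\}$. 

We consider properties which are satisfied only on subsets of the unit square $\mathcal{I} \times \mathcal{I}$ and only in an 
approximate way. For a given imaginary part $t$, these subsets and the quality of the approximation depend 
on $t$ (in a polynomial way), and there are various parameters $(\alpha, \beta, \gamma, \delta)$ for the possible exponents. 

One lets $n(\beta,t) := [\beta \log|t|]$, $n(\theta,t) := [\theta \log|t|]$ and considers the following subsets of $\mathcal{H}^*$, 

The following subsets of $\mathcal{I} \times \mathcal{I}$ are called "fundamental unions": 

$$\mathcal{J}(t,\beta,\delta) := \bigcup_{h \in \mathcal{H}(t,\beta,\delta)} h(\mathcal{I}) \times h(\mathcal{I}), \qquad \mathcal{J}(t,\beta,\delta,\theta) := \bigcup_{h \in \mathcal{H}(t,\beta,\delta)} h \circ \ell(\mathcal{I}) \times h \circ \ell(\mathcal{I}).$$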

¹ The pressure is the logarithm of the dominant eigenvalue $\lambda(s)$. 
In the proof, there are various subsets which intervene: subsets $\mathcal{A}$, related to the notion of "almost 
eigenfunctions"; subsets $\mathcal{C}$, related to the notion of "almost invariant functions"; subsets $\mathcal{S}$, which 
approximate subsets $\mathcal{A} \setminus \mathcal{C}$; subsets $\mathcal{B}$, related to the behaviour of the iterates of the operator; and subsets 
$\mathcal{D}$, related to the growth of the quasi-inverse of the secant operator. The final subset of interest is the 
subset $\mathcal{D}$, and the other ones form a chain of subsets which will be compared to $\mathcal{D}$ in the proof. The 
first three ones involve the approximate subset $\mathcal{J}(t,\beta,\delta,\theta)$. 

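The set $\mathcal{A}(\alpha,\beta,\delta,\theta)$ gathers all the reals $t$ for which there exists a pair $(w_t, \eta_t)$ that satisfies, with $n = n(\beta,t)$, 

$$\left| \mathbb{M}_t^{n}[w_t](x,y) - \eta_t\, w_t(x,y) \right| < \frac{1}{|t|^{\alpha}}, \quad \text{for any } (x,y) \in \mathcal{J}(t,\beta,\delta,\theta). \qquad (15)$$

The set $\mathcal{C}(\alpha,\beta,\gamma,\delta,\theta,k_0)$ gathers all the reals $t$ for which there exists a pair $(w_t, \eta_t)$ that satisfies 

$$|\eta_t - 1| < \frac{1}{|t|^{\gamma}}, \qquad \left| \mathbb{M}_t^{n}[w_t](x,y) - \eta_t\, w_t(x,y) \right| < \frac{1}{|t|^{\alpha}}, \quad \text{for any } (x,y) \in \mathcal{J}(t,\beta,\delta,\theta). \qquad (16)$$

The set $\mathcal{S}(\alpha,\beta,\gamma,\delta,\theta,k_0)$ gathers the reals $t$ for which there exists a pair $(w_t, \eta_t)$ that satisfies 

$$|\eta_t - 1| \ge \frac{1}{|t|^{\gamma}}, \qquad \left| \mathbb{M}_t^{n}[w_t](x,y) - \eta_t\, w_t(x,y) \right| < \frac{1}{|t|^{\alpha}}, \quad \text{for any } (x,y) \in \mathcal{J}(t,\beta,\delta,\theta). \qquad (17)$$

The inclusion $\mathcal{A}(\alpha,\beta,\delta,\theta) \setminus \mathcal{C}(\alpha,\beta,\gamma,\delta,\theta,k_0) \subset \mathcal{S}(\alpha,\beta,\gamma,\delta,\theta,k_0)$ holds. 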

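Let $\Psi$ be the invariant function of $\mathbb{H}_1$. The set $\mathcal{B}(\alpha,\beta,\theta)$ gathers all the reals $t$ for which there exists $u_t$, 
with $\|u_t\|_{(t)} \le 1$, that satisfies, for any $n \le 3n(\beta,t)$ and any $(x,y) \in \mathcal{J}(t,\theta)$, 

$$\left| \mathbb{H}_{1+it}^{n}[\Psi u_t](x,y) \right| \ge \Psi(x,y) \left( 1 - \frac{1}{|t|^{\alpha}} \right).$$

The set $\mathcal{D}(\alpha)$ gathers the reals $t$ for which the quasi-inverse is of polynomial growth with exponent $\alpha$: 

$$\mathcal{D}(\alpha) := \{t,\ M(t) < |t|^{\alpha}\}.$$

We wish to prove that there exists $\alpha$ for which the complement $\mathcal{D}^c(\alpha)$ is bounded. 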

3.5. Relation between diophantine properties, subsets $\mathcal{A}$ and $\mathcal{C}$. There are two main results, 
described in Lemma 0 (with subset $\mathcal{A}$) and Lemma 1 (with subset $\mathcal{C}$). 

Lemma 0. Consider a triple $(h,k,\ell) \in (\mathcal{H}^*)^3$ and a real $\alpha > 1$. If there exists a triple $(\beta,\delta,\theta)$ with 

$$\frac{\delta}{\beta} > \max\{c(h), c(k), c(\ell)\}, \qquad (18)$$

for which $\mathcal{A}(\alpha,\beta,\delta,\theta)$ is unbounded, then $c(h,k,\ell)$ has an irrationality exponent at least equal to $\alpha - 1$. 
If $c(h,k,\ell)$ is diophantine with exponent $\mu$, then for any 4-uple $(\alpha,\beta,\delta,\theta)$ with $\alpha > \mu + 1$ and $(\beta,\delta)$ 
which satisfies (18), the subset $\mathcal{A}(\alpha,\beta,\delta,\theta)$ is bounded. 

Lemma 1. Consider a pair $(h,k) \in (\mathcal{H}^*)^2$ and a real $\mu > 1$. If there exists a 6-uple $(\alpha,\beta,\gamma,\delta,\theta,k_0)$ with 
$\min(\alpha,\gamma) = \mu$, and 

$$\frac{\delta}{\beta} > \max\{c(h), c(k)\}, \qquad (19)$$

for which the subset $\mathcal{C}(\alpha,\beta,\gamma,\delta,\theta,k_0)$ is unbounded, then $c(h,k)$ has an irrationality exponent at least 
equal to $\mu - 1$. 

If $c(h,k)$ is diophantine with exponent $\mu$, then, for each 6-uple $(\alpha,\beta,\gamma,\delta,\theta,k_0)$ with $\min(\alpha,\gamma) > \mu + 1$ 
and $(\beta,\delta)$ that satisfies (19), the subset $\mathcal{C}(\alpha,\beta,\gamma,\delta,\theta,k_0)$ is bounded. 

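In the remainder of the proof, we use the notion of weak inclusion between two subsets $\mathcal{X}$ and $\mathcal{Y}$ of $\mathbb{R}$. 
The subset $\mathcal{X}$ is said to be weakly included in $\mathcal{Y}$ [this is denoted by $\mathcal{X} \subset \mathcal{Y}$] if there exists $t_1 \in \mathbb{R}$ for 
which $\mathcal{X} \cap [t_1, +\infty[\ \subset\ \mathcal{Y} \cap [t_1, +\infty[$. 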
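
Weak inclusion enters the proof through two elementary observations, spelled out here for convenience (the names $\mathcal{X}$, $\mathcal{Y}$, $\mathcal{Z}$, $t_1$, $t_2$, $t_3$ and $b$ are only notation for this remark): weak inclusion is transitive, and a set that is weakly included in a set bounded from above is itself bounded from above.

```latex
\[
\mathcal{X}\cap[t_1,+\infty[\ \subset\ \mathcal{Y}\cap[t_1,+\infty[
\quad\text{and}\quad
\mathcal{Y}\cap[t_2,+\infty[\ \subset\ \mathcal{Z}\cap[t_2,+\infty[
\ \Longrightarrow\
\mathcal{X}\cap[t_3,+\infty[\ \subset\ \mathcal{Z}\cap[t_3,+\infty[,
\qquad t_3 := \max(t_1,t_2).
\]
\[
\mathcal{X}\cap[t_1,+\infty[\ \subset\ \mathcal{Y}\cap[t_1,+\infty[
\quad\text{and}\quad
\mathcal{Y}\subset\,]-\infty, b]
\ \Longrightarrow\
\mathcal{X}\cap[t,+\infty[\ \subset\ \mathcal{Y}\cap[t,+\infty[\ =\ \emptyset
\quad\text{for any } t > \max(t_1, b).
\]
```

This is how boundedness of a set built from diophantine information can be transferred along a chain of weak inclusions.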

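3.6. Relations between subsets $\mathcal{A}$ and $\mathcal{D}$. Lemma 2 compares subsets $\mathcal{B}$ and $\mathcal{A}$ whereas Lemma 3 
compares subsets $\mathcal{B}$ and $\mathcal{D}$. Lemmas 2 and 3 are summarized in Lemma 4 which compares subsets $\mathcal{A}$ 
and $\mathcal{D}$. Lemmas 0 and 4 together prove Theorem 5(a). 

Lemma 2. For any 4-uple $(\alpha,\beta,\delta,\theta)$ that satisfies $\beta |\log \rho| > \alpha + 1$, 

the weak inclusion $\mathcal{B}(\alpha_1,\beta,\theta) \subset \mathcal{A}(\alpha,\beta,\delta,\theta)$ holds, for any $\alpha_1 > 2\alpha + \delta$. 

Lemma 3. For any triple $(\alpha_1,\beta,\theta)$ that satisfies $\theta |\log \rho| > \alpha_1 + 1$, 

the weak inclusion $\mathcal{B}^c(\alpha_1,\beta,\theta) \subset \mathcal{D}(\alpha_2)$ holds for any $\alpha_2 > 2\alpha_1 + 1$. 

Lemma 4. For any 4-uple $(\alpha,\beta,\delta,\theta)$ that satisfies $\beta |\log \rho| > \alpha + 1$, $\theta |\log \rho| > 2\alpha + \delta + 1$, 
the weak inclusion $\mathcal{D}^c(\alpha_2) \subset \mathcal{A}(\alpha,\beta,\delta,\theta)$ holds, for any $\alpha_2 > 4\alpha + 2\delta + 1$. 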
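
The parameter ranges of Lemma 4 follow from those of Lemmas 2 and 3 by eliminating the intermediate exponent $\alpha_1$. The short derivation below is a sketch of that bookkeeping only; it takes the two weak inclusions themselves for granted and keeps the condition $\beta|\log\rho| > \alpha + 1$ of Lemma 2 unchanged.

```latex
\begin{align*}
&\text{Lemma 2 requires } \alpha_1 > 2\alpha + \delta, \qquad
 \text{Lemma 3 requires } \theta|\log\rho| > \alpha_1 + 1
 \ \text{ and } \ \alpha_2 > 2\alpha_1 + 1. \\
&\text{Such an } \alpha_1 \text{ exists if and only if } \quad
 2\alpha + \delta \;<\; \min\Bigl(\theta|\log\rho| - 1,\ \tfrac{\alpha_2 - 1}{2}\Bigr), \\
&\text{that is, if and only if } \quad
 \theta|\log\rho| > 2\alpha + \delta + 1
 \quad\text{and}\quad
 \alpha_2 > 2(2\alpha + \delta) + 1 = 4\alpha + 2\delta + 1 .
\end{align*}
```

These are exactly the two conditions on $\theta$ and $\alpha_2$ that appear in the statement of Lemma 4.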

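3.7. Relation between subsets $\mathcal{S}$ and $\mathcal{D}$. This relation is described in Lemma 5. 

Lemma 5. One considers the logarithm $L(s)$ of the dominant eigenvalue $\lambda(s)$ of the operator $\mathbb{H}_s$ and 
the real $\nu_1$ defined from the pressure function by the two equations 

$$(1 - \sigma_1) L'(\sigma_1) + L(\sigma_1) = \log \rho, \qquad \nu_1 = -L'(\sigma_1).$$

For any 5-uple $(\alpha,\beta,\gamma,\delta,\theta)$ which satisfies the relations 

$$\beta > \frac{\alpha + 1}{|\log \rho|}, \qquad \frac{\delta}{\beta} > \nu_1,$$

there exists an integer $k_0$ for which the weak inclusions 
$\mathcal{S}(\alpha,\beta,\gamma,\delta,\theta,k_0) \subset \mathcal{D}(2\alpha)$, and thus $\mathcal{D}^c(2\alpha) \subset \mathcal{A}^c(\alpha,\beta,\delta,\theta) \cup \mathcal{C}(\alpha,\beta,\gamma,\delta,\theta,k_0)$, hold. 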

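3.8. Relation between subsets $\mathcal{C}$ and $\mathcal{D}$. One gathers the conclusions of Lemmas 4 and 5 in Lemma 6. 
Lemmas 1 and 6 together prove Theorem 5(b). 

Lemma 6. One considers the logarithm $L(s)$ of the dominant eigenvalue $\lambda(s)$ of the operator $\mathbb{H}_s$ and the 
real $\nu_1$ defined from the pressure function by the two equations 

$$(1 - \sigma_1) L'(\sigma_1) + L(\sigma_1) = \log \rho, \qquad \nu_1 = -L'(\sigma_1).$$

For any 5-uple $(\alpha,\beta,\gamma,\delta,\theta)$ which satisfies the relations 

$$\beta > \frac{\alpha + 1}{|\log \rho|}, \qquad \theta > \frac{2\alpha + \delta + 1}{|\log \rho|}, \qquad \frac{\delta}{\beta} > \nu_1,$$

there exists an integer $k_0$ for which the weak inclusion 

$\mathcal{D}^c(2\alpha + \delta) \subset \mathcal{C}(\alpha,\beta,\gamma,\delta,\theta,k_0)$ holds. 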



